Visual Reasoning


VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning

Dec 03, 2024
Via arXiv

Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation

Dec 03, 2024
Via arXiv

Grid-augmented vision: A simple yet effective approach for enhanced spatial understanding in multi-modal agents

Dec 03, 2024
Via arXiv

VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation

Dec 03, 2024
Via arXiv

Who Walks With You Matters: Perceiving Social Interactions with Groups for Pedestrian Trajectory Prediction

Dec 03, 2024
Via arXiv

Understanding the World's Museums through Vision-Language Reasoning

Dec 02, 2024
Via arXiv

FastRM: An efficient and automatic explainability framework for multimodal generative models

Dec 02, 2024
Via arXiv

OBI-Bench: Can LMMs Aid in Study of Ancient Script on Oracle Bones?

Dec 02, 2024
Via arXiv

Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning

Dec 02, 2024
Via arXiv

PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos

Dec 02, 2024
Via arXiv